Scour
🎮 Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Scoured 579 posts in 59.4 ms
Sampling-Based Safe Reinforcement Learning
♟️
Game Theory
arxiv.org
·
23h
GRIP-VLM: RL for Efficient Vision-Language Models
💬
LLMs
startuphub.ai
·
6d
Scaling Reinforcement Learning at Applied Compute
🤖
AI Agents
modal.com
·
1d
PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play
🤖
AI Agents
vmax.ai
·
5h
·
Hacker News
https://odyssey.ml/research
🤖
AI Agents
odyssey.ml
·
1d
Learning Systems and Innate Behavior
🤖
AI Agents
costa-and-associates.com
·
10h
·
Hacker News
Reinforcement Learning: An Introduction (2nd Edition)
📐
ML Theory
chizkidd.github.io
·
5d
How Auto Transport Companies Are Leveraging AI for Precision Logistics
🤖
Machine Learning
haulin.ai
·
22h
·
DEV
The Safety Paradox: How RLHF Creates the AI Psychosis Problem It’s Meant to Prevent
📡
Information Theory
promptinjection.net
·
2d
·
Hacker News
Long Context Pre-Training w/ Lighthouse Attention
💬
LLMs
mail.bycloud.ai
·
1d
Cursor bets on cheaper coding with Composer 2.5 and Kimi K2.5
🛠️
Developer Tools
thenewstack.io
·
16h
inclusionAI/Ring-2.6-1T
🕸️
WebAssembly
huggingface.co
·
6d
·
Hacker News, r/LocalLLaMA
Training SID-1 to beat GPT-5 at search with 1k+ QPS RL
🔍
RAG
turbopuffer.com
·
1d
·
Hacker News
Olympiad Gold Medal Packed into a Two-Step Recipe
🤖
Machine Learning
ai-brief.liziran.com
·
3d
A lock proves the security of the room and not that the room is empty
🤖
AI Agents
github.com
·
2d
·
Hacker News
Cursor launches Composer 2.5 model for long-running AI coding tasks at cheaper token cost
🏢
Software Industry
indianexpress.com
·
1d
[MIT] RLCR: Teaching AI models to say "I'm not sure"
🤖
Machine Learning
csail.mit.edu
·
6d
·
r/LocalLLaMA
Massachusetts Institute of Technology Introduction to Deep Learning
🧠
Neural Networks
i-programmer.info
·
1d
Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning
💬
LLMs
arxiv.org
·
23h
RL for red teaming: training models to attack and defend themselves
🤖
AI Agents
castform.com
·
6d
·
Hacker News